SPECIAL REPORT ON MICROPROCESSORS/MICROCOMPUTERS PART II 


THIRTY-TWO-BIT MICROS POWER WORKSTATIONS


New “star” processors require strong supporting casts and
clever designs to be effective engines in engineering
workstations.


by Nicolas Mokhoff, 
Senior Editor 


With system engineers just beginning to reap the
benefits of their 16-bit microprocessor-based designs,
and 32-bit chips about to be integrated into new
designs, a complacent observer might conclude that
an IBM/370 mainframe-equivalent computer will soon
sit on every professional’s desk. Behind this
“ethereal” vision, however, lies a more realistic
prognosis: the mainframe-per-desk dream will
materialize only when all components of this powerful
workstation perform as members of one orchestra
in a symphony of information processing and
communication.

To make successful “data music” together, 16-bit
and 32-bit microprocessors must perform in concert
within a system architecture. No matter what
outstanding features a sophisticated microprocessor
possesses, the chip by itself contributes little to the
workstation’s overall performance. It must also be
built around an efficient bus architecture and
surrounded with proper hardware and software support.

Moreover, high performance workstations usually
embody more than one “star” microprocessor. In
an efficient multitasking, multiprocess workstation,
separate 8-, 16-, and 32-bit processors are
used for such data-intensive functions as graphics,
high speed I/O, diagnostics, and communications.
The designer’s system specification must therefore
accommodate this multiprocessing capability, a
tricky task, to say the least.


The current frenetic pace of defining 32-bit
microprocessor bus specifications lends credence to
the notion that design engineers need more than
glorified VLSI chips to make their systems really
work. Thus, both chip manufacturers and users have
strongly identified their marketing strategies and
applications with one of four open-system bus
schemes (Multibus II, VMEbus, NuBus, and
Futurebus) or with some kind of proprietary bus.
Proprietary 32-bit bus schemes are closely
tied to a particular hardware configuration and
therefore have limited appeal to the design
community at large.

One such “closed” open-system bus scheme is in
the works at Digital Equipment Corp (Maynard,
Mass). The company is developing a 32-bit bus
architecture for a whole family of new 32-bit
machines. If history is a guide, DEC should have
wide success in promulgating a whole new board
industry on a par with its Q-bus and Q-22 compatible
board business. However, because those boards are
intimately tied to DEC computers, they remain of
interest only to that particular segment of the design
community.

Syte Information Technology’s first workstation, the model 300,
displays the kind of power now available to the design engineer.
It is designed for engineering-oriented applications including
software development, computer aided engineering, and
project management. Its multiple microprocessor hardware
lets users make floating point calculations at speeds
comparable to a DEC VAX-11/750 supermini. The system allows
multiple operating systems to run concurrently on one node,
and simultaneously on multiple computers on the network.

For the rest of the design world, choosing among
available open-system buses can be a harrowing
experience. Of course, advocates of the respective
schemes will defend their bus to the death on its
worthy technical merits. However, they will also
readily admit that marketing, and compatibility
with an installed base of products built around
the 8- and 16-bit predecessors of their chosen bus,
are critical factors in the selection process.

The bus standardization issue is, in itself, a moving
target. It is best, therefore, to let the standards
documents themselves supply detailed information
on the individual buses (see reference section,
“Obtaining bus specifications,” on p 112). In light
of recent developments, it is perhaps more beneficial
to concentrate on a selection of hardware and
software choices made by manufacturers of sophisticated
workstations. In each case, the company chooses a
processor, system architecture, software environment,
and communication capability to produce a
single-user networked workstation that suits a
particular set of specifications for a well-targeted
market. Among the workstations highlighted are
those from Syte Information Technology (San Diego,
Calif), Sun Microsystems (Mountain View, Calif),
Apollo Computer, Inc (Chelmsford, Mass), Tektronix
(Wilsonville, Ore), Hewlett-Packard (Palo
Alto, Calif), Jupiter Systems, Inc (Berkeley, Calif),
and Saber Technology Corp (San Jose, Calif). This
analysis offers a general perspective on both the
components and the thinking needed to get the most
out of the new 32-bit micros.

In the case of Syte Information Technology,
company principals went out of their way to endorse
the National Semiconductor (Santa Clara, Calif)
NS32032 when that chip was introduced. The
microprocessor became the CPU for the company’s
first product, the model 300 workstation. Aside
from the publicity benefits for both companies, the
endorsement also yielded some interesting insights
into the technical evaluations done by Syte engineers.
According to Syte system architect Michael Fischer,
“For the first time, a single chip [NS32032]
incorporates many features that were previously
associated with mainframe machines.”


An efficient chip architecture 

Two key features of the NS32032 are the symmetry
of its instruction set and its ability to run
programs written in high level languages efficiently.
The instruction set’s orthogonality allows all
instruction types to operate on all addressing modes
and data types. This makes it relatively easy to
construct efficient programs in either assembly
language or high level languages. And, because
special cases rarely need to be handled, programs
are more portable and easier to debug and modify.
The features that allow efficient execution of high
level language programs matter even more, because
virtually all of the software for modern processors
(including applications, compilers, and operating
systems) is now written in high level languages.

Part of this high level language support is provided 
by a module-linkage facility. This facility permits 
independently compiled modules to call each other 
through a mechanism that does not require a linking 
loader. The benefits here are easy use and easy distri- 
bution of utility and support software. According 
to Fischer, the NS32032’s module-linkage facility 
protects proprietary code because source code need 
not be distributed. The modules are encapsulated 
and independent. They can thus be efficiently shared 
using the common-linkage mechanism, regardless of 
the source language in which the module was written. 
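To make the linkage idea concrete, here is a minimal Python sketch of calling through a per-module link table instead of patching addresses at link time. It is a simplified model loosely inspired by the NS32000's module descriptors and its call-external-procedure instruction; the classes `Module` and `System`, the `link_table` layout, and the `calls` counter are hypothetical, and the real hardware also saves the caller's module and return address on the stack.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Module:
    """Hypothetical module descriptor: static data, link table, entry points."""
    name: str
    static: Dict[str, int] = field(default_factory=dict)
    # Each link-table entry names (target module, entry point), not an address.
    link_table: List[Tuple[str, str]] = field(default_factory=list)
    code: Dict[str, Callable] = field(default_factory=dict)


class System:
    def __init__(self) -> None:
        self.modules: Dict[str, Module] = {}

    def load(self, module: Module) -> None:
        # Loading only registers the descriptor; no code in other modules is
        # patched, so no linking-loader pass is required.
        self.modules[module.name] = module

    def call_external(self, caller: Module, link_index: int, *args):
        # Look up the caller's link-table entry, switch to the callee's
        # descriptor (and thus its own static data), and enter the procedure.
        target_name, entry = caller.link_table[link_index]
        target = self.modules[target_name]
        return target.code[entry](target, *args)


def square(mod: Module, x: int) -> int:
    mod.static["calls"] = mod.static.get("calls", 0) + 1  # callee's static data
    return x * x


def square_v2(mod: Module, x: int) -> int:
    return x ** 2


system = System()
mathlib = Module("mathlib", code={"square": square})
app = Module("app", link_table=[("mathlib", "square")])
system.load(mathlib)
system.load(app)
first = system.call_external(app, 0, 7)

# Replacing the library needs no change to `app`, only a new descriptor.
system.load(Module("mathlib", code={"square": square_v2}))
second = system.call_external(app, 0, 8)
print(first, second, mathlib.static["calls"])  # 49 64 1
```

Because the caller holds only a symbolic link-table entry, a new version of the library can be distributed in compiled form and dropped in without relinking its clients.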

Another important feature for high level language
support is the chip’s set of addressing modes. The
available modes include stack-frame support, with
a hardware frame pointer register; static-data area
support, with a hardware pointer register; and
dual-displacement indirect addressing modes with
both a pre- and post-index for accessing pointers in
a stack frame or in the static area. Other modes
include external addressing for referencing
variables located in other modules, and scaled-index
mode, which makes array indexing much simpler,
according to Fischer.
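The address arithmetic behind two of these modes is simple enough to sketch. In the Python model below, memory is a dict from addresses to 32-bit values, the notation in the docstrings follows NS32000 assembler style as we understand it, and the register value, offsets, and function names are hypothetical. The point is that a pointer held in a stack frame can be dereferenced and offset within one operand, and an array element can be addressed without a separate multiply instruction.

```python
# Operand sizes for scaled-index mode: byte, word, double word, quad word.
SCALE = {"B": 1, "W": 2, "D": 4, "Q": 8}


def memory_relative(mem: dict, reg: int, disp1: int, disp2: int) -> int:
    """disp2(disp1(reg)): fetch the pointer stored at reg + disp1, then add disp2."""
    pointer = mem[reg + disp1]
    return pointer + disp2


def scaled_index(base: int, index: int, size: str) -> int:
    """base[index:size]: base address plus index times the operand size."""
    return base + index * SCALE[size]


FP = 0x1000                  # hypothetical frame pointer value
mem = {FP - 8: 0x2000}       # a local variable holding a pointer to a record

field_addr = memory_relative(mem, FP, -8, 12)   # field at offset 12 of the record
elem_addr = scaled_index(0x3000, 5, "D")        # element 5 of a double-word array
print(hex(field_addr), hex(elem_addr))          # 0x200c 0x3014
```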

Many instructions in the NS32032 instruction set
are oriented specifically toward efficient high level
language support. These include iterative array
subscript calculations, array-index bounds checking,
a multiway branch, and separate division operations
that produce either the quotient or the remainder.
(Most machines take the time to produce both
quotient and remainder in all cases, even though
only one is usually needed.) In addition, there are
string operations that generalize to bytes, words,
and double words, and bit-field operations that
can access variable-length, bit-addressed fields
anywhere in memory without byte boundary or
alignment restrictions. A rich set of floating point
operations extends the integer operations in a
consistent manner.
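Several of these operations have semantics that are easy to misread, so a Python sketch of what they compute may help. The function names are hypothetical and only model the results: `quo`/`rem` round the quotient toward zero (the remainder takes the dividend's sign), `div`/`mod` round toward negative infinity (the remainder takes the divisor's sign), `index_step` is one step of a row-major subscript calculation, `check` is a bounds test, and `extract_field` reads a bit field that may straddle a byte boundary. The NS32000 pairs QUO/REM and DIV/MOD this way as far as we know, but treat the exact operand and flag conventions as assumptions.

```python
def quo(a: int, b: int) -> int:
    """Quotient rounded toward zero."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def rem(a: int, b: int) -> int:
    """Remainder matching quo(): takes the sign of the dividend."""
    return a - b * quo(a, b)


def div(a: int, b: int) -> int:
    """Quotient rounded toward negative infinity (Python's //)."""
    return a // b


def mod(a: int, b: int) -> int:
    """Remainder matching div(): takes the sign of the divisor (Python's %)."""
    return a % b


def index_step(accum: int, upper: int, idx: int) -> int:
    """One subscript step for a dimension with indices 0..upper (row-major)."""
    return accum * (upper + 1) + idx


def check(idx: int, lower: int, upper: int):
    """Bounds test: (True, idx - lower) if in range, else (False, None)."""
    if lower <= idx <= upper:
        return True, idx - lower
    return False, None


def extract_field(mem: bytes, bit_offset: int, length: int) -> int:
    """Read `length` bits starting at `bit_offset`, ignoring byte boundaries."""
    value = int.from_bytes(mem, "little")
    return (value >> bit_offset) & ((1 << length) - 1)


print(quo(-7, 2), rem(-7, 2), div(-7, 2), mod(-7, 2))   # -3 -1 -4 1
# A[0..3][0..4]: element A[2][3] is at linear position 2 * 5 + 3.
print(index_step(2, 4, 3))                               # 13
print(check(9, 1, 8), check(4, 1, 8))                    # (False, None) (True, 3)
# Bits 4..9 straddle the boundary between the two bytes.
print(extract_field(bytes([0b1011_0000, 0b0000_0011]), 4, 6))  # 59
```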

Moreover, the chip’s memory management
architecture is patterned after the memory management
systems used on a number of successful large scale
machines, including the IBM 370 series and DEC VAX.
Memory management support instructions are tightly
integrated with the CPU instruction set. Other
tightly coupled CPU functions can be implemented on
peripheral chips as a “custom slave”: instructions
for this slave already exist in the instruction set,
and the NS32032 hardware designer supplies the slave
circuitry.


The four principles of tight coupling 

The chip’s tightly coupled architecture provides
a firm basis for Syte’s closely coupled system
architecture in the model 300 workstations, as well
as in the company’s overall network of model 300
workstations, the series 3000. The Syte architecture
serves as the foundation of a product family that
the company plans to develop throughout the decade.
The architecture’s principles define the structure of
the hardware, software, and user interface, and
they can be applied generically to the majority of
workstation architectures. These principles are
embodied in four main objectives: a single-level,
network-wide, demand-paged address space; a single,
uniform interface between all system resources; a
configuration that supports a wide range of
performance levels; and a technology-independent
system that is economically feasible.

The first objective, a network-wide address space,
allows any user (or object) to gain direct access,
through a common mechanism, to any other object or
resource on the Syte network. Bruce Hamilton, the
company’s vice president of engineering, says that
a global address space of 64 bits is sufficient to
accommodate the substantial increases in memory
and bulk storage that technology advances might
provide during this decade. To manage this large
address space, demand paging is used within the
address space itself, where it supports a separate
virtual address space for each process running
anywhere in the network. Demand paging permits
a program, or a combination of programs, larger
than physical memory to be run on any node
in the network. Initially, each process’s virtual
address space will be 24 bits, or 16 Mbytes. Larger
process address spaces of up to 32 bits will be
supported when the technology becomes economical,
according to Hamilton. Demand paging is now a key
feature of other manufacturers’ workstations as well.
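Demand paging can be illustrated with a small Python model of one process's 24-bit virtual address space. The two-level page table with an 8/7/9-bit address split (512-byte pages) is an assumption chosen for illustration, as are the FIFO replacement policy and the dict standing in for the disk or network backing store. Pages are loaded only when first touched, so a program can use more pages than there are physical frames.

```python
from collections import OrderedDict

PAGE_BITS = 9                          # assumed 512-byte pages
L2_BITS = 7                            # assumed second-level index width
L1_BITS = 24 - L2_BITS - PAGE_BITS     # remaining 8 bits index the first level
PAGE_SIZE = 1 << PAGE_BITS
ADDRESS_SPACE = 1 << 24                # 16 Mbytes per process


class Process:
    def __init__(self, backing_store: dict, max_frames: int):
        self.l1 = {}                    # sparse first level: l1 index -> second-level table
        self.backing = backing_store    # page number -> page contents (disk or network)
        self.max_frames = max_frames    # physical frames available to this process
        self.resident = OrderedDict()   # (l1, l2) -> page data, oldest first
        self.faults = 0

    def read(self, vaddr: int) -> int:
        if not 0 <= vaddr < ADDRESS_SPACE:
            raise ValueError("address outside the 24-bit virtual space")
        l1_idx = vaddr >> (L2_BITS + PAGE_BITS)
        l2_idx = (vaddr >> PAGE_BITS) & ((1 << L2_BITS) - 1)
        offset = vaddr & (PAGE_SIZE - 1)
        l2 = self.l1.setdefault(l1_idx, {})
        if l2_idx not in l2:            # page fault: fetch the page on demand
            self.faults += 1
            if len(self.resident) == self.max_frames:
                (old_l1, old_l2), _ = self.resident.popitem(last=False)
                del self.l1[old_l1][old_l2]   # evict the oldest resident page
            page_no = vaddr >> PAGE_BITS
            data = self.backing.get(page_no, bytes(PAGE_SIZE))
            self.resident[(l1_idx, l2_idx)] = data
            l2[l2_idx] = data
        return l2[l2_idx][offset]


p = Process(backing_store={2: bytes([7]) * PAGE_SIZE}, max_frames=2)
for page in (0, 1, 2, 0, 2):
    p.read(page * PAGE_SIZE)
print(p.faults, p.read(2 * PAGE_SIZE + 5))   # 4 7
```

A real implementation would also write dirty pages back before evicting them; this read-only sketch omits that.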


The NS32032 chip incorporates many 
features previously associated with 
mainframe machines. 


The second objective, a single, uniform interface 
between all system resources, is provided by an 
operating system kernel called the global environ- 
ment manager (GEM). Files, directories, buffers, 
processes, windows, and devices are all treated as 
objects by the GEM. Different object types are ar- 
ranged in a hierarchy and messages are interpreted 
by the object to which they are sent. The sender of 
a message need only know which object to send the 
message to, and what is to be done with it—nothing 
about how it will be done. 

The GEM creates every object with a unique
identifier to permit unambiguous communication between
all objects in a network. An object interprets only
its own messages and manages its own internal data
structures. The capabilities of any object can be
extended or modified without altering any code
associated with other objects.
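The message-passing discipline described for the GEM can be sketched in a few lines of Python. Everything here is hypothetical: `ObjectManager`, the `(node, serial)` identifier scheme, and the `msg_` naming convention are illustrative choices, not Syte's interfaces. The sketch shows a sender naming only an object and a message, each object interpreting its own messages, and a subtype extending behavior without any change to existing code.

```python
import itertools


class ObjectManager:
    """Hypothetical stand-in for a GEM-like kernel on one network node."""

    def __init__(self, node_id: int):
        self.node_id = node_id
        self._serial = itertools.count(1)
        self.objects = {}

    def create(self, cls, *args):
        # Node id plus a local serial number is unique network-wide,
        # provided node ids are themselves unique.
        uid = (self.node_id, next(self._serial))
        self.objects[uid] = cls(uid, *args)
        return uid

    def send(self, uid, message: str, *args):
        # The sender names the object and the message, nothing about "how".
        return self.objects[uid].receive(message, *args)


class GemObject:
    def __init__(self, uid):
        self.uid = uid

    def receive(self, message: str, *args):
        handler = getattr(self, "msg_" + message, None)
        if handler is None:
            raise ValueError(f"{type(self).__name__} does not understand {message!r}")
        return handler(*args)


class File(GemObject):
    def __init__(self, uid):
        super().__init__(uid)
        self.data = bytearray()

    def msg_write(self, chunk: bytes) -> int:
        self.data += chunk
        return len(self.data)

    def msg_read(self) -> bytes:
        return bytes(self.data)


class LoggedFile(File):
    """Extends File's capabilities; File and its senders are unchanged."""

    def __init__(self, uid):
        super().__init__(uid)
        self.log = []

    def msg_write(self, chunk: bytes) -> int:
        self.log.append(len(chunk))
        return super().msg_write(chunk)


gem = ObjectManager(node_id=3)
f = gem.create(LoggedFile)
gem.send(f, "write", b"abc")
gem.send(f, "write", b"de")
print(f, gem.send(f, "read"), gem.objects[f].log)  # (3, 1) b'abcde' [3, 2]
```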

The company’s third objective, the ability to
configure a system for a wide range of performance
levels, is achieved by optimizing the performance
of a single node (a standalone workstation). The
Syte architecture thus provides many performance
levels: closely coupled multiprocessor configurations
for high performance; loosely coupled configurations
for distributed processing; and multiuser
configurations for a low cost per station.
Configuration flexibility is enhanced by using
standard interfaces for external connections. These
include RS-232, Multibus (IEEE 796), iSBX (IEEE P959),
Ethernet (IEEE 802.3), Small Computer System Interface
(American National Standards Institute X3T9.2),
and QIC-02/QIC-24. Finally, the fourth objective,
technology independence, is met by writing all
system software in a high level language, using
proprietary high bandwidth internal buses, and
adhering to industry standard external interfaces.

The Syte network consists of nodes connected by 
an Ethernet local area network (LAN). The GEM 
operating system runs in a supervisory mode on each 
node to make the total resources of the network
available to all users. A processor module in each 
node is connected to the network through its high 
speed I/O subsystem via Intel (Santa Clara, Calif) 
80186 and 8051 interface chips (Fig 1). 

A node is the basic physical entity in the network.
It consists of one or more of the following: a
processor module, a memory module, a graphics
module, and peripheral devices (eg, Winchester
disks, quarter-inch streaming tape, a floppy disk,
and a line printer). Each module contains
cooperating systems that perform functionally
distributed tasks. The modules communicate via a
25-Mbyte/s Sytebus. A node can contain up to four
modules of each type, and up to eight modules in
total, in any combination.
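The node-configuration limits lend themselves to a quick validity check. In this Python sketch the module names are hypothetical labels, and peripheral devices are not counted as modules, which is our reading of the text.

```python
from collections import Counter

MODULE_TYPES = {"processor", "memory", "graphics"}
MAX_PER_TYPE = 4    # up to four modules of each type
MAX_MODULES = 8     # up to eight modules in any combination


def config_problems(modules: list) -> list:
    """Return a list of rule violations for one node (empty if valid)."""
    counts = Counter(modules)
    problems = [f"unknown module type: {t}" for t in sorted(set(counts) - MODULE_TYPES)]
    if len(modules) > MAX_MODULES:
        problems.append(f"{len(modules)} modules exceeds the limit of {MAX_MODULES}")
    for kind in sorted(counts):
        if kind in MODULE_TYPES and counts[kind] > MAX_PER_TYPE:
            problems.append(f"{counts[kind]} {kind} modules exceeds {MAX_PER_TYPE}")
    return problems


print(config_problems(["processor"] * 4 + ["memory"] * 4))   # []
print(config_problems(["processor"] * 5 + ["graphics"] * 4))
# ['9 modules exceeds the limit of 8', '5 processor modules exceeds 4']
```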


Bit slicing into a graphics processor 

Thirty-two-bit microprocessors are still not
powerful and flexible enough to meet high resolution
graphics requirements. Thus, the graphics
module is designed with a high speed, microprogrammed,
bipolar graphics processor (AMD 29116)
and 1 Mbyte of display memory. Its high resolution
display output can drive one to eight 1000- x 800-pixel
monochrome monitors, one 1000- x 800-pixel x 8-plane
color monitor, or two 1000- x 800-pixel x 4-plane
color monitors. The graphics processor performs
display-list interpretation, vector generation, raster
operations, and full clip-and-pick support. A Syte
node can contain up to four graphics modules.
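A quick calculation shows why these display options pair with 1 Mbyte of display memory: each needs eight bits in total for every 1000- x 800-pixel position. The sketch assumes one bit per pixel per plane, reads the single color monitor as an 8-plane display, and ignores any memory the graphics processor might reserve for other uses.

```python
DISPLAY_MEMORY = 1024 * 1024        # 1 Mbyte of display memory
WIDTH, HEIGHT = 1000, 800


def frame_bytes(planes: int) -> int:
    """Bytes for one 1000 x 800 frame at one bit per pixel per plane."""
    return WIDTH * HEIGHT * planes // 8


options = {
    "eight monochrome monitors": 8 * frame_bytes(1),
    "one 8-plane color monitor": frame_bytes(8),
    "two 4-plane color monitors": 2 * frame_bytes(4),
}
for name, needed in options.items():
    print(f"{name}: {needed:,} bytes, fits: {needed <= DISPLAY_MEMORY}")
# each option needs 800,000 bytes, which fits in 1,048,576
```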

As with other modules, the company’s graphics
module has a closely coupled architecture. In
conventional systems, interactive graphics display
generators usually attach to their host computers as
peripherals. This type of loosely coupled configuration
requires that all graphics information be transferred
between the host and the display generator by
programmed I/O or DMA transfers. This not only
incurs the I/O handling overhead of the host operating
system, but also imposes a bottleneck on the flow of
information between the application program and the
display generator. As a result, loose coupling
reduces performance by increasing the amount of data
transferred, and requires software and/or firmware
whose only function is to support the transfer process.

Fig 1 A Syte node typifies other workstations by containing four
basic entities: a processor module, a memory module, a graphics
module, and peripheral devices. While the NS32032, its peripheral
memory management unit (MMU) IC, and optional floating point unit
(FPU) are the “stars” of the module, three other processor chips
are essential supporters: a 16-bit 80186 for high speed I/O,
an 8-bit 8051 for intelligent communications, and another
8-bit 8051 for system support.

In high performance distributed workstations such as
Syte’s, the display subsystem is more tightly coupled.
Typically, tighter coupling eliminates the interface
circuitry by placing the display generator directly
on the workstation’s I/O bus. This saves the cost of
the interface, but addresses neither the excess data
transfers nor the additional software required to
support them.

The architecture of Syte’s graphics facilities
reduces both the data transferred and the support
software required. The product line has two
implementations that solve these problems: one
oriented toward high performance at intermediate cost,
the other toward intermediate performance at low cost.
Both graphics options are closely coupled to the rest
of the system and are transparent to application
software, because they use the same objects, messages,
and external data structures.

The first implementation uses an optional onboard
display subsystem on the processor module. This
subsystem generates a high resolution monochrome
(1000 x 800) or medium resolution (640 x 480 x 4)
color display directly from the onboard main
memory. A separate memory port is used to fetch
display data from memory without involving the
local bus, which connects to other subsystems. The
only interface is the one to main memory, which all
of the processing subsystems already share. Cost is
minimized because the display generator does not
even require interboard busing.

The second implementation is the graphics module, 
which attaches to the 25-Mbyte/s Sytebus. The close 
coupling between the graphics module and the 
processing subsystems on other modules is achieved 
by using the common physical address space of the 
Sytebus. The coupling operates in both directions. 
Image memory on the graphics module is directly 
addressable as system memory. Moreover, the 
graphics processor can directly address any offboard 
memory to access display data structures, font 
descriptions, etc. The display list can be managed 
by the same memory management routines as the 
rest of the system, and without the need to main- 
tain two parallel data structures. 

Because Syte is a relatively new startup company,
its engineers could design the tightly coupled display
subsystem from the start to fit within the overall
system architecture. Working for an established
organization such as Tektronix, which faces the
constraints of an existing product architecture,
calls for different tactics. Such is the case for
Tektronix engineers in Wilsonville, Ore, in their
design of the 4115B graphics terminal.


Popping the bottleneck cork 

The design goal of the 4115B is to raise the
performance of the 4113 graphics terminal from its
normally adequate 2-s response time to “instantaneous”
response. This requirement stems from users’ exposure
to powerful single-unit systems, which perform
functions locally and have accustomed users to
realtime response.

The 4113’s bottleneck arose because the central
processor (the 8086) was severely overloaded while the
vector-generator hardware was idle most of the time.
To remedy this, Tektronix engineers opt for the
bit-slice route instead of a fast 32-bit micro. The
vector-generator hardware is replaced with a
microprogrammable bit-slice “picture processor,”
which off-loads the lower level graphics-processing
functions from the 8086. Along with the bit-slice
processor, the picture processor contains several
hardware accelerators for speed-critical tasks. The
entire “update” fits onto two standard-sized cards.

The 4115B picture processor is an instruction-set
processor that executes programs (display lists) built
by code running on the 8086. The initial specification
of the display-list format was written jointly by the
software engineers who wrote the original 8086 code
and by the microcoders who would implement the
instruction set. While the specification evolved along
with the implementation, the partitioning of tasks
between the 8086 and the picture processor did not
change drastically after the first specification.

In both the 4113 and the 4115B, the 8086 handles
the multitasking operating system, host communication,
peripheral management, and display-list management
functions. The code in the 4115B, however, differs
significantly from that of the 4113 in several key
areas. For example, many data paths are 32 bits wide
to support the 32-bit coordinate space. Also, new
algorithms and data structures allow faster, more
space-efficient creation of many small graphics
segments. New code also drives the hardware dialogue
overlay and cursor overlay (not present in the 4113).
Moreover, an added 8087 numeric coprocessor performs
the wide arithmetic operations needed to generate
graphics-image transforms for the picture processor.


One 32-bit workstation uses an
“ideal” combination to produce one
of the industry’s fastest and highest
resolution displays.


The picture processor executes commands from 
a display list that is resident in system memory. It 
transforms graphics primitives, described in a 32-bit 
integer coordinate space, into 1280- x 1024-pixel 
screen-coordinate space, and clips the results to 
rectangular view ports on the screen. It scan-converts 
the transformed primitives and writes pixels into the 
frame buffer. Using information from the display 
list, the picture processor also controls the appear- 
ance parameters (eg, primitive attributes such as 
line style, filled or hollow areas, and background 
transparency of dot-matrix characters). 
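The picture processor's core loop (transform, clip, then hand the result to scan conversion) can be sketched in Python. This is an illustration under stated assumptions, not Tektronix microcode: the `("move"|"draw", x, y)` display-list format and the function names are hypothetical, the window-to-viewport mapping uses floating point rather than the 4115B's integer arithmetic, and clipping uses the standard Cohen-Sutherland algorithm, which the article does not say the 4115B uses.

```python
INSIDE, LEFT, RIGHT, BOTTOM, TOP = 0, 1, 2, 4, 8


def to_screen(x, y, window, viewport):
    """Map a point from the window (world coordinates) onto the viewport."""
    wx0, wy0, wx1, wy1 = window
    vx0, vy0, vx1, vy1 = viewport
    sx = vx0 + (x - wx0) * (vx1 - vx0) / (wx1 - wx0)
    sy = vy0 + (y - wy0) * (vy1 - vy0) / (wy1 - wy0)
    return sx, sy


def outcode(x, y, rect):
    xmin, ymin, xmax, ymax = rect
    code = INSIDE
    if x < xmin:
        code |= LEFT
    elif x > xmax:
        code |= RIGHT
    if y < ymin:
        code |= BOTTOM
    elif y > ymax:
        code |= TOP
    return code


def clip(x0, y0, x1, y1, rect):
    """Cohen-Sutherland line clipping; returns None if nothing is visible."""
    xmin, ymin, xmax, ymax = rect
    c0, c1 = outcode(x0, y0, rect), outcode(x1, y1, rect)
    while True:
        if not (c0 | c1):
            return x0, y0, x1, y1          # both ends inside: accept
        if c0 & c1:
            return None                    # both ends beyond the same edge: reject
        c = c0 or c1                       # pick an endpoint that is outside
        if c & TOP:
            x, y = x0 + (x1 - x0) * (ymax - y0) / (y1 - y0), ymax
        elif c & BOTTOM:
            x, y = x0 + (x1 - x0) * (ymin - y0) / (y1 - y0), ymin
        elif c & RIGHT:
            x, y = xmax, y0 + (y1 - y0) * (xmax - x0) / (x1 - x0)
        else:
            x, y = xmin, y0 + (y1 - y0) * (xmin - x0) / (x1 - x0)
        if c == c0:
            x0, y0, c0 = x, y, outcode(x, y, rect)
        else:
            x1, y1, c1 = x, y, outcode(x, y, rect)


def run_display_list(display_list, window, viewport):
    """Execute move/draw commands; return the clipped screen-space segments."""
    segments, pen = [], None
    for op, x, y in display_list:
        point = to_screen(x, y, window, viewport)
        if op == "draw" and pen is not None:
            seg = clip(*pen, *point, viewport)
            if seg is not None:
                segments.append(seg)
        pen = point
    return segments


display_list = [("move", 0, 0), ("draw", 2000, 500),     # partly visible
                ("move", 2000, 0), ("draw", 3000, 1000)]  # entirely off to the right
print(run_display_list(display_list, (0, 0, 1000, 1000), (100, 100, 500, 400)))
# [(100.0, 100.0, 500, 175.0)]
```

The output would then go to scan conversion, which writes the pixels into the frame buffer.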

Since the picture processor is an independently 
executing processor, it must gain access to the system 
bus and perform data transfers to and from system 
memory and I/O devices. In the 4115B, the details of 
these low level operations are hidden from the 
microcode. Two hardware state machines (both 
resident in a single registered programmable array 
logic IC) implement the bus-acquisition and data- 
transfer protocols. 

A single microinstruction activates the machines.
The microcode can then continue executing until it
needs the results of a bus read, or until it tries to
start another bus operation. At that point, a
hardware “wait” mechanism temporarily halts the
picture processor until the original cycle is completed.
Thus, the microcode does not have to test any status
flags to see whether a transfer is complete before
starting another one.
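The effect of the hardware wait can be shown with a toy cycle-count model in Python. The four-cycle bus latency, the one-cycle cost per microinstruction, and the `bus`/`use`/`alu` operation names are assumptions for illustration. The model simply stalls whenever microcode starts a new bus cycle or needs a result while a transfer is still in progress, so no status-polling microinstructions appear in the program.

```python
BUS_CYCLES = 4   # assumed latency of one bus transfer, in microcycles


def run(microprogram):
    """Simulate microcode with a hardware wait on bus operations.

    Each step is 'alu' (internal work), 'bus' (start a bus transfer), or
    'use' (consume the result of the last bus read). Returns (cycles, stalls).
    """
    clock = 0
    bus_free_at = 0
    stalls = 0
    for op in microprogram:
        if op in ("bus", "use") and clock < bus_free_at:
            stalls += bus_free_at - clock   # processor halts until the cycle ends
            clock = bus_free_at
        if op == "bus":
            bus_free_at = clock + BUS_CYCLES
        clock += 1
    return clock, stalls


# Useful work overlapped with the transfer hides its latency entirely ...
print(run(["bus", "alu", "alu", "alu", "use"]))   # (5, 0)
# ... while back-to-back bus operations simply wait; no flag testing is coded.
print(run(["bus", "bus", "use"]))                 # (9, 6)
```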


(Fig 2 chart: drawing rates of the 4113 and the 4115B in short vectors/s, on a logarithmic scale from 100 to 100,000. *From disk file, not in display list.)


Fig 2 When Tektronix engineers enhanced the 4113
graphics terminal with a picture processor and added
special microcode to the 8086 processor, the result was the
4115B terminal featuring a 32-bit coordinate space. In
terms of the number of elements that can be drawn per
second, the 4115B considerably outperforms the 4113.


To assist in transforming points from 32-bit
terminal-coordinate space, the picture processor
also features low cost, serial/parallel multiplier
hardware. Like the bus-transfer circuitry, this
hardware is activated by, and can operate in parallel
with, the microcode. To achieve the desired
processing speed, multiplications of up to 48 x 48
bits are performed using multiple passes and
partial-product accumulation in microcode.
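Multiple-pass multiplication with partial-product accumulation is ordinary long multiplication in a wide radix. The Python sketch below assumes a 16- x 16-bit hardware multiplier (the article does not give the multiplier's width) and unsigned operands; a 48- x 48-bit product then takes nine passes, each shifted into place and summed.

```python
DIGIT_BITS = 16                       # assumed hardware multiplier width
MASK = (1 << DIGIT_BITS) - 1


def hw_multiply(a: int, b: int) -> int:
    """Stand-in for one pass of the hardware multiplier (16 x 16 -> 32 bits)."""
    assert 0 <= a <= MASK and 0 <= b <= MASK
    return a * b


def wide_multiply(x: int, y: int, bits: int = 48):
    """Unsigned bits x bits multiply built from 16 x 16 partial products."""
    assert bits % DIGIT_BITS == 0 and 0 <= x < 1 << bits and 0 <= y < 1 << bits
    digits = bits // DIGIT_BITS
    xs = [(x >> (DIGIT_BITS * i)) & MASK for i in range(digits)]
    ys = [(y >> (DIGIT_BITS * j)) & MASK for j in range(digits)]
    acc, passes = 0, 0
    for i, xi in enumerate(xs):
        for j, yj in enumerate(ys):
            # Shift each partial product to its weight and accumulate.
            acc += hw_multiply(xi, yj) << (DIGIT_BITS * (i + j))
            passes += 1
    return acc, passes


x, y = 0xFFFF_FFFF_FFFF, 0x1234_5678_9ABC
product, passes = wide_multiply(x, y)
print(product == x * y, passes)   # True 9
```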

Testing a design’s value requires a quantitative
metric against which system performance is
measured. For the design of the 4115B, senior engineer
Douglas Doornink and his design team at the
Tektronix Information Display Division establish
performance metrics according to the product’s
target areas. The metrics measure the per-second
drawing speed of vectors, segments, and simple
panels. All vectors and panels have two-dimensional
transforms applied to them as they are drawn.

Fig 3 The performance of the 4115B can be compared to
the performance of other architectures by using just the
speed at which short vectors are drawn as the metric.
Given the relatively limited upgrading done on the 4113
terminal, the 4115B performs well above the Apollo DN420.
However, it does not approach the speed of a Seillac-7,
where the operations are performed through a pipeline
architecture.

From analyzing sample pictures from typical
applications, the team determines that the average
vector length is about 10 pixels on a 1280 x 1024
display. Pictures in this category average about
30,000 vectors. Pictures with longer vectors
typically have fewer of them. In the worst case, an
application may have only one vector per segment;
in the best case, all vectors would be in one
segment. According to Doornink, because the picture
processor incurs significant overhead for each
segment, both cases require benchmarking.
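A simple cost model shows why both extremes need benchmarking when each segment carries a fixed overhead. The per-vector and per-segment costs below are made-up placeholders, not measured 4115B figures; only the 30,000-vector picture size comes from the text.

```python
def draw_time_us(n_vectors, vectors_per_segment, per_vector_us, per_segment_us):
    """Total drawing time: fixed overhead per segment plus cost per vector."""
    segments = -(-n_vectors // vectors_per_segment)   # ceiling division
    return segments * per_segment_us + n_vectors * per_vector_us


N = 30_000                         # average vectors per picture, from the text
PER_VECTOR, PER_SEGMENT = 10, 50   # hypothetical costs in microseconds

best = draw_time_us(N, N, PER_VECTOR, PER_SEGMENT)   # all vectors in one segment
worst = draw_time_us(N, 1, PER_VECTOR, PER_SEGMENT)  # one vector per segment
for label, t in (("best", best), ("worst", worst)):
    print(f"{label}: {t / 1e6:.2f} s, {N / (t / 1e6):,.0f} vectors/s")
```

With these placeholder costs the worst case is six times slower than the best, even though the vector count is identical, which is why a vectors-per-second figure means little without the segment structure.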

Another important picture type uses simple panels,
defined as panels having fewer than 16 edges. Simple
panels appear, for instance, in a solids model, where
a mesh description is used and the image is generated
with many quadrilaterals. For these image types, the
quadrilaterals have an area of about 100 pixels.
In other applications, such as VLSI computer aided
design, the panels to be filled are even simpler and
can be rendered as rectangles. Because of this,
another benchmark is needed for rectangle fill
performance, again using rectangles with an area of
about 100 pixels.

The performance gains from adding a microcoded
picture processor are dramatic, as shown in Fig 2.
Comparisons with four other architecture types show
that, overall, the 4115B fares well against the more
conservative approaches (Fig 3).


“Ideal” combination makes a fast display 

For its part, Saber Technology Corp develops its 
32-bit workstation using an “ideal” combination of
the NS32032, the Unix operating system, and a 
proprietary circuit technology called QSEL. The 
result is one of the industry’s fastest and highest 
resolution displays—a 19-in. video monitor with a 
60-Hz noninterlaced refresh rate that can address 
bit-mapped graphics with a full gray-scale resolution 
of 1664 x 1248 pixels. 

To complement its high resolution monitor, a 
graphics subsystem is designed to support the 
2-million pixel image for ultrahigh resolution. And, 
to meet the needs of three-dimensional design and 
solids modeling, the graphics subsystem architecture 
is based on a proprietary high bandwidth bus 
structure (Fig 4). 

Fig 4 An example of 32-bit microprocessor power combined
with innovative circuitry for display on a high resolution
screen (1664 x 1248 pixels with a full gray scale) is
Saber Technology’s Interactive Graphics Computer System (IGCS).

The system provides true bit-mapped graphics
and allows control of any pixel placement and
movement. The graphics subsystem can take data
(eg, a design) directly from mass storage, bypassing
the CPU, and bring it to the video display
very quickly under DMA control. The image memory
unit consists of image planes designed for
the 2-million pixel density of the display. The basic
subsystem can contain up to eight image planes,
and expansion is available for a total of 24 image
planes. The image subsystem supports an ultrahigh
speed display interface providing 180-MHz video
with 24-bit color resolution. Each primary color has
an effective 8-bit conversion at this very high video
data rate, which results in a pixel time of less
than 6 ns.
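The display figures can be cross-checked with a little arithmetic. The Python sketch below uses only numbers from the text; the interpretation that the gap between the visible pixel rate and the 180-MHz video clock is taken up by horizontal and vertical blanking is our inference, and image-plane storage assumes one bit per pixel per plane.

```python
WIDTH, HEIGHT = 1664, 1248
REFRESH_HZ = 60                      # noninterlaced
VIDEO_CLOCK_HZ = 180e6

pixels = WIDTH * HEIGHT                             # the "2-million pixel" image
visible_rate = pixels * REFRESH_HZ                  # visible pixels per second
pixel_time_ns = 1e9 / VIDEO_CLOCK_HZ                # time per pixel clock
blanking_share = 1 - visible_rate / VIDEO_CLOCK_HZ  # clock share outside the image
plane_bytes = pixels // 8                           # one bit per pixel per plane

print(f"{pixels:,} pixels, {visible_rate / 1e6:.1f} Mpixel/s visible")
print(f"pixel time {pixel_time_ns:.2f} ns, about {blanking_share:.0%} blanking")
print(f"8 planes: {8 * plane_bytes:,} bytes; 24 planes: {24 * plane_bytes:,} bytes")
```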

The NS32032 used in Saber’s workstation runs at
10 MHz and executes approximately 1.2 million
instructions per second (MIPS). The system also has
a high performance floating point coprocessor.
Saber has designed its own direct memory access
(DMA) system, which has a 20-Mbyte/s transfer rate
and eight variable-burst-length, dynamically
relocatable channels. All DMA devices are first in,
first out (FIFO) buffered.

Saber’s system, dubbed the Interactive Graphics
Computer System (IGCS), has two independent disk
controllers, each on a separate DMA channel with
its own disk subsystem. One is allocated to the Unix
operating system, the other to applications.
Each disk subsystem can support four drives.

An NS32032 running at 10 MHz controls the graphics
subsystem, which uses a proprietary 20-Mbyte/s bus and
QSEL circuit technology as a fast graphics buffer. The
Berkeley Unix operating system, Ethernet, and IBM coaxial
connections ensure that the workstation is both a
standalone unit and an integral part of a computer
aided engineering network.

Meanwhile, Apollo Computer adds a 32-bit
workstation as another node on its proprietary Domain
network. The company describes Domain as a
distributed processing system designed for both
general-purpose and interactive graphics applications.

In essence, the network is a collection of powerful
personal workstations and “server” computers
interconnected by a high speed LAN. Both workstations
and server computers can run very large and
complex applications. Each personal workstation has a
high resolution, bit-mapped display on which the user
can view the output of programs written in Fortran, C,
or Pascal. A Domain server processor can act as a file
or peripheral server, as well as a gateway to other
networks. All workstations and server processors share
a common network-wide virtual memory system that allows
groups of users to share programs, files, and
peripherals.

The Apollo DN320 workstation distinguishes itself by
incorporating a standard hardware floating point
processor. The processor is implemented in bit-slice
technology and supports both 32- and 64-bit floating
point data formats that adhere to the IEEE standard.
As another node in the Domain network, the DN320 offers
each user 1.5 Mbytes of main memory, 16 Mbytes of virtual
address space, and a 1024- x 800-pixel display. A 68010
is used as the central processor.

Fig 5 Adherence to an open-system architecture and to industry bus standards is a design requirement for the
Sun-2. Using the IEEE 796 Multibus specifications for the card cage and backplane, the unit divides its tasks between
the two backplane connectors: P1 is used for I/O access to a wide variety of peripherals, while P2 provides
high speed data transfers to both main memory and display memory.

Like the workstations from Syte and Sun
Microsystems, the Domain station consists of three main
parts: a powerful processor with a large virtual
address space, a high speed cache, and a high
resolution, bit-mapped graphics display subsystem.
Moreover, the workstation supports a full 256-Mbyte
virtual address space. Thus, applications that exist
on mainframes or superminicomputers can be converted
to the Domain system with minimal effort, according
to Dave Nelson, Apollo’s vice president of research
and development. Virtual memory is structured around
objects: 32-bit, byte-addressable virtual address
spaces that are accessible from anywhere in the network.


It's all part of the domain 

To explain the advantages of object-based sys- 
tems, Nelson comments, ‘‘In conventional time- 
sharing systems, separate mechanisms are frequently 
used to implement similar system functions.’’ For 
example, Nelson points out that programs may be 
managed by a paging system, whereas data files are 
accessed and handled through a file system. Thus,
two distinct system mechanisms exist to handle 
similar system entities. In contrast, he notes, a 
Domain-type system deals exclusively with objects 
without regard to their physical location on the 
network or their specific functions. According to 
Nelson, the object abstraction simplifies the overall
system design by casting all system entities into
a common framework and by managing them with 
a common set of mechanisms. 
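The contrast Nelson draws can be illustrated with a toy model. This is a minimal sketch, not Apollo's implementation: the class and method names (`ObjectStore`, `create`, `map_object`) and the node names are hypothetical, and a Python dictionary stands in for objects that may reside anywhere on the network. The point is only that a program image and a data file are both named objects reached through one mapping mechanism, rather than one going through a pager and the other through a file system.

```python
class ObjectStore:
    """Toy network-wide store in which every entity is a named object (hypothetical)."""

    def __init__(self):
        # object name -> (node that holds it, contents as bytes)
        self._objects = {}

    def create(self, name, node, contents):
        self._objects[name] = (node, bytes(contents))

    def map_object(self, name):
        """One mechanism for every entity: return a byte-addressable view.

        The caller never learns which node holds the object, nor whether
        it is program code or data.
        """
        _node, contents = self._objects[name]
        return memoryview(contents)


store = ObjectStore()
store.create("/progs/editor", node="node-7", contents=b"\x4e\x71\x4e\x75")  # 68000 NOP, RTS
store.create("/data/report", node="node-2", contents=b"quarterly figures")

code = store.map_object("/progs/editor")   # a program, mapped like any object
data = store.map_object("/data/report")    # a data file, mapped the same way
print(code[0], bytes(data[0:9]))
```

In a real object-based system the view would be backed by demand paging across the network; here the dictionary lookup merely marks where that single mechanism would sit.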

The DN320, for example, can execute up to 24 
concurrent processes with 16 Mbytes of virtual 
address space per process on the terminal’s multi- 
window display. Standard within the DN320 is a 
floating point unit implemented in microcoded bit- 
slice technology. 

Sun Microsystems’ workstations sport similar 
features. Based on the 68010 and, when available, on 
the 68020 32-bit architecture, the Sun-2 can also use 
up to 16 Mbytes of virtual address space per process. 
Because Sun is a fairly new startup company, its engineers
go to great lengths to make sure their design
adheres to popular standards (Fig 5). This should allow
future Suns within the same family to communicate
with one another and to absorb technology innovations transparently.


As with other standalone workstations, the Sun-2 family 
runs on a Unix derivative that allows each station to 
operate without a separate disk storage unit. Instead, 
diskless nodes use the Ethernet network to perform demand 
paging as well as routine file I/O. The 68010-based
workstation is designed for easy upgrading to true 32-bit 
computing power when the 32-bit (both address and data) 
68020 chips become available. 


Currently, Sun-2 workstations operate the 68010 
processor with a 10-MHz clock. At this high clock 
rate, most microprocessor systems are limited by the 
speed of their main memory because the CPU must 
incur one or more wait states on every access to
memory. These states are essentially wasted clock 
cycles. While some designs attempt to relieve this 
problem by introducing expensive high speed cache 
memory access, Sun engineers use a custom memory 
management unit (MMU) that allows the processor 
to access all of main memory without wait states. 
In effect, this turns all of main memory into cache. 
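The cost of a wait state is easy to quantify. The sketch below assumes the 68000 family's minimum bus cycle of four clock periods, with each wait state stretching the cycle by one further clock period; at 10 MHz one clock period is 100 ns. It ignores instruction overlap and counts only the memory access itself.

```python
def bus_cycle_ns(clock_mhz, wait_states, base_clocks=4):
    """Duration of one memory access for a 68000-family CPU.

    base_clocks=4 is the minimum (zero-wait-state) bus cycle; each wait
    state adds one more clock period of idling.
    """
    period_ns = 1000.0 / clock_mhz
    return (base_clocks + wait_states) * period_ns


for ws in range(3):
    t = bus_cycle_ns(10, ws)
    wasted = 100 * ws / (4 + ws)
    print(f"{ws} wait state(s): {t:.0f} ns per access, {wasted:.0f}% of the cycle idle")
```

With one wait state on every access, a 10-MHz processor spends one clock in five doing nothing, which is the loss a zero-wait-state memory design avoids.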

The MMU implements two-level address transla- 
tion for virtual memory operations, providing both 
segment and page addressing. The MMU hardware 
supports multiprocessing by keeping separate read,
write, and execute permissions for both the operating
system and the user on every memory page.
Optimizations for the company’s Unix operating 
system include referenced and modified bits for each 
page of memory to facilitate efficient demand-paging 
algorithms, and hardware support for eight separate 
contexts to facilitate rapid process switching. 
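A two-level translation of this kind can be sketched as follows. The eight contexts, per-page supervisor and user permissions, and referenced and modified bits come from the description above; the geometry (2-Kbyte pages, 16 pages per 32-Kbyte segment, 512 segments per context, giving the 16 Mbytes per process cited earlier) and all class and field names are assumptions for illustration, not Sun's published design.

```python
from dataclasses import dataclass

PAGE_SIZE = 2048          # assumption: 2-Kbyte pages
PAGES_PER_SEGMENT = 16    # assumption: 16 pages = 32-Kbyte segments
SEGMENTS = 512            # 512 x 32 Kbytes = 16 Mbytes per context
CONTEXTS = 8              # eight hardware contexts, as in the text


@dataclass
class PageEntry:
    frame: int            # physical page number
    perms: dict           # e.g. {"user": "r-x", "supervisor": "rwx"}
    referenced: bool = False
    modified: bool = False


class MMUFault(Exception):
    pass


class TwoLevelMMU:
    def __init__(self):
        # Level 1: one segment map per context; each slot names a page-map group
        self.segment_maps = [[None] * SEGMENTS for _ in range(CONTEXTS)]
        # Level 2: page-map groups of PAGES_PER_SEGMENT entries each
        self.page_groups = []
        self.context = 0  # a process switch just loads a new context number

    @staticmethod
    def _split(vaddr):
        vpage, offset = divmod(vaddr, PAGE_SIZE)
        segment, page = divmod(vpage, PAGES_PER_SEGMENT)
        return segment, page, offset

    def map_page(self, ctx, vaddr, entry):
        segment, page, _ = self._split(vaddr)
        if self.segment_maps[ctx][segment] is None:
            self.page_groups.append([None] * PAGES_PER_SEGMENT)
            self.segment_maps[ctx][segment] = len(self.page_groups) - 1
        self.page_groups[self.segment_maps[ctx][segment]][page] = entry

    def translate(self, vaddr, access, mode="user"):
        """Map a virtual address to a physical one; access is 'r', 'w', or 'x'."""
        segment, page, offset = self._split(vaddr)
        group = self.segment_maps[self.context][segment]
        entry = None if group is None else self.page_groups[group][page]
        if entry is None:
            raise MMUFault(f"page fault at {vaddr:#x}")
        if access not in entry.perms[mode]:
            raise MMUFault(f"{mode} '{access}' access denied at {vaddr:#x}")
        entry.referenced = True       # set on every access
        if access == "w":
            entry.modified = True     # set only on writes
        return entry.frame * PAGE_SIZE + offset


mmu = TwoLevelMMU()
data_page = PageEntry(frame=5, perms={"user": "rw-", "supervisor": "rw-"})
mmu.map_page(ctx=3, vaddr=0x8000, entry=data_page)
mmu.context = 3
print(hex(mmu.translate(0x8010, "w")))
```

Because each context keeps its own segment map, switching between up to eight resident processes requires no table reloading, which is the point of the hardware contexts.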

Unix is emerging as the standard operating system 
for this new generation of professional workstations. 
This is largely due to the popularity of the PDP-11 


108 COMPUTER DESIGN/June 15, 1984


and the VAX-11—two units responsible for Unix’s 
widespread use. But, Bill Joy, vice president of 
research and development at Sun Microsystems, 
adds a warning to the current Unix euphoria by 
saying, ‘‘It would be naive to assume that Unix is 
unqualifiedly suitable to this personal computing 
environment, or that the changes made in the past 
few years to the most advanced versions of Unix 
were directly applicable to the workstation environ- 
ment.’’ According to Joy, the change from a time- 
sharing environment to a shared-resource personal 
computing environment requires a reexamination of 
the Unix system facilities on these machines. 

Joy joined Sun straight from the University of 
California at Berkeley, where he designed and imple- 
mented the Berkeley version of Unix, called Unix 
4.1 BSD. He later helped develop Unix 4.2 BSD. As 
such, he may be the most qualified to expand on 
the virtues and limitations of Unix (see Panel,
‘‘Massaging Unix for the workstation’’).


Variations on the Unix theme 

While Sun Microsystems has placed its stake in 
the Berkeley version of Unix 4.2 BSD, Hewlett- 
Packard is gambling on an enhanced version of 
AT&T Bell Lab’s Unix. The company’s version is 
called HP-UX and is for the HP 9000 series 500 com- 
puters, a series based on the company’s proprietary 
32-bit, three-chip set. HP-UX is a combination of 
AT&T Bell Lab’s Unix, portions of the University of 
California at Berkeley’s Unix implementation, and 
Hewlett-Packard software enhancements. Because 
Unix is easy to implement on a variety of proces- 
sors and computer architectures, Hewlett-Packard 
engineers see it as the ideal choice. Moreover, it is 
compatible with the distinct architectures of current HP
9000 products: the 68000-based series 200 and the 
company’s 32-bit based series 500 computers. HP-UX 
is also planned for future products. 

Obviously, HP-UX facilitates easy importation of
Unix-derived programs onto Hewlett-Packard equipment,
and offers users a consistent, powerful program
development environment. Complementary extensions
address the company's own Manufacturer's Productivity
Network (MPN), which reflects the company's view of
how computer systems can be used in manufacturing
organizations to improve productivity.

Rather than implementing every function of
System III Unix, Hewlett-Packard software engineers
include only those features that are important either
for porting standard software or for their importance
to program development. With these guidelines, a
compatibility hierarchy is used in which kernel services
have the highest priority, library subroutines come
second, and commands come third. As a result of this
approach, HP-UX includes all System III kernel intrinsics
and all libraries except for a handful of graphics
subroutines. More than


‘‘Massaging’’ Unix for the workstation


Unix is available today on all major 16-bit and soon- 
to-be-available 32-bit microprocessors, all vying for 
use in high performance workstations. It is the domi- 
nant system on the Motorola 68000 family. And, with 
the forthcoming high performance 32-bit chips, Unix 
will be available on a mature architecture where an 
individual workstation will pack more performance 
than the large time-shared DEC VAX-11/780. This turn 
of events will put new demands on Unix, because it 
needs to change to meet single-user system needs, 
rather than spend its efforts trying to apportion an 
overloaded time-sharing system. 

Bill Joy, vice president of research and development
at Sun Microsystems, places his bets on a new version
of the Berkeley 4.2 BSD as the predominant version for
the 32-bit based workstations. ‘‘The standard versions
of Unix in use today, Version 7, Xenix [from
Microsoft], and System III from Western Electric, are
all similar in facilities,’’ says Joy, ‘‘but all of these
systems lack true interprocess communication primi- 
tives.’’ He notes that while the pipe facility allows 
related processes to communicate in a one-directional, 
byte-stream fashion, the standard Unix versions lack 
facilities for unrelated processes to communicate, and 
for communication over a local network. 
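The limitation Joy describes follows from how a pipe is created: its two file descriptors exist only in the process that called pipe() and in children that inherit them across fork(), so an unrelated process has no way to reach them. A minimal sketch using Python's thin wrappers over those system calls (POSIX only, since os.fork is unavailable on Windows):

```python
import os


def child_to_parent(message: bytes) -> bytes:
    read_fd, write_fd = os.pipe()      # one-directional byte stream
    pid = os.fork()                    # the child inherits both descriptors
    if pid == 0:                       # child: uses the write end only
        os.close(read_fd)
        os.write(write_fd, message)
        os.close(write_fd)
        os._exit(0)
    os.close(write_fd)                 # parent: uses the read end only
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:                  # EOF once the child has closed its end
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return b"".join(chunks)


print(child_to_parent(b"hello from the child"))
```

Data flows one way only; a reply from parent to child would need a second pipe.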


Debugging support 

Joy contends that standard Unix also has very poor 
support for debugging. The only available debugger 
program does not support source-level debugging, but 
requires the inspection of assembly language and low 
level program details in operation. In addition, the standard
Unix has little support for the smart terminals,
which have become widely available and quite inexpensive.
One must recall that Unix was written when
the predominant terminals were hardcopy or refresh
terminals. Programs supported terminals with cursor
addressing and other such features only inadequately.

As the principal developer of 4.2 BSD, Joy says that 
this version addresses the various usage problems on 
smart workstations, especially when these are tied in
a network. Joy explains, ‘‘The 4.2 BSD system 
includes a number of enhancements to the earlier stan- 
dard systems. First and foremost of these enhance- 
ments is full support for local networking and 
interprocess communications.’’ According to Joy, this 
support provides full access to local network proto- 
cols, and allows unrelated processes to conveniently 
communicate. The system includes abstract models 
of both datagram and stream-oriented communication,
and allows processes to perform I/O operations without
blocking. The processes can thus multiplex input
from and output to different streams. 
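These facilities are the socket interface that 4.2 BSD introduced, and Python's socket and select modules are thin wrappers over it. The sketch below uses socketpair() for brevity; unrelated processes would instead meet at a named address (a file-system path or a network address). It shows the two communication models (datagrams keep message boundaries, while a stream does not promise to) and non-blocking I/O multiplexed with select(). AF_UNIX sockets assume a POSIX system.

```python
import select
import socket


def datagram_messages():
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    a.send(b"one")
    a.send(b"two")
    msgs = [b.recv(1024), b.recv(1024)]   # each recv returns exactly one datagram
    a.close()
    b.close()
    return msgs


def stream_bytes():
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    a.sendall(b"one")
    a.sendall(b"two")
    a.close()                             # reader sees EOF after the queued data
    data = b""
    while chunk := b.recv(1024):          # boundaries between sends are not kept
        data += chunk
    b.close()
    return data


def multiplex():
    """Wait on several streams at once without blocking on any one of them."""
    pairs = [socket.socketpair() for _ in range(3)]
    readers = [r for _, r in pairs]
    for r in readers:
        r.setblocking(False)              # recv() with no data raises instead of waiting
    pairs[1][0].sendall(b"ready")         # only stream 1 has input
    readable, _, _ = select.select(readers, [], [], 1.0)
    got = {readers.index(r): r.recv(1024) for r in readable}
    for w, r in pairs:
        w.close()
        r.close()
    return got


print(datagram_messages())
print(stream_bytes())
print(multiplex())
```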

In addition, the standard 4.2 BSD system includes a
full implementation of the Defense Advanced Research
Projects Agency's (DARPA) Transmission Control Protocol
and Internet Protocol (TCP/IP). Other protocol
implementations, including support for standard
protocols such as X.25, are underway for the system.
The 4.2 BSD system also has a source language debugger
(dbx) that replaces the previous assembly-level
debugging with support for source-level expressions,
the setting of breakpoints and conditional breakpoints,
and other features commonly found in other commercial systems.
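The same socket calls carry TCP/IP traffic. A minimal sketch over the loopback interface, kept in one process for brevity: port 0 asks the system for any free port, and the echo exchange is purely illustrative.

```python
import socket


def tcp_echo_once(payload: bytes) -> bytes:
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))          # port 0: let the system pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    client = socket.create_connection(("127.0.0.1", port))
    conn, _addr = server.accept()          # handshake already completed via listen()

    client.sendall(payload)
    client.shutdown(socket.SHUT_WR)        # tell the server no more data is coming
    received = b""
    while chunk := conn.recv(4096):
        received += chunk
    conn.sendall(received)                 # echo it back
    conn.close()

    reply = b""
    while chunk := client.recv(4096):
        reply += chunk
    client.close()
    server.close()
    return reply


print(tcp_echo_once(b"hello over TCP/IP"))
```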


The 4.2 BSD system includes a terminal data base, 
first developed for the Berkeley version of Unix Ver- 
sion 7. This data base allows programs to be written 
independent of the terminal type being used. Joy says 
that this approach is quite successful, and descriptions 
of several hundred different terminal types are avail- 
able. This allows Unix to support the wide range of 
CRT terminals found in the market today. The 4.2 BSD 
also includes support for virtual memory management, 
an essential requirement for supporting the large
applications typical of the VAX/Unix environment. This
virtual memory support also lets Unix support substantial
Lisp applications. It can also take over applications
from the popular PDP-10, whose address space has proved
insufficiently large for them.
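The idea behind the terminal data base can be sketched as follows. The two entries are simplified, hypothetical excerpts in termcap style (real vt100 and adm3a entries carry many more capabilities plus padding information), and `expand_cm` is a small illustrative function, not the termcap library routine; it handles only the %i, %d, %+c, and %% operators of a cursor-motion string.

```python
TERMCAP = {
    # Simplified termcap-style entries: capability name -> value
    "vt100": {"cl": "\x1b[H\x1b[2J", "cm": "\x1b[%i%d;%dH"},
    "adm3a": {"cl": "\x1a", "cm": "\x1b=%+ %+ "},
}


def expand_cm(cm: str, row: int, col: int) -> str:
    """Expand a termcap-style cursor-motion string for a 0-based (row, col)."""
    args = [row, col]
    out, i = [], 0
    while i < len(cm):
        ch = cm[i]
        if ch != "%":
            out.append(ch)
            i += 1
            continue
        op = cm[i + 1]
        if op == "i":                      # %i: terminal counts from 1, not 0
            args = [a + 1 for a in args]
            i += 2
        elif op == "d":                    # %d: next argument in decimal
            out.append(str(args.pop(0)))
            i += 2
        elif op == "+":                    # %+c: next argument plus ord(c), as a character
            out.append(chr(args.pop(0) + ord(cm[i + 2])))
            i += 3
        elif op == "%":                    # %%: a literal percent sign
            out.append("%")
            i += 2
        else:
            raise ValueError(f"unsupported operator %{op}")
    return "".join(out)


def move_cursor(term: str, row: int, col: int) -> str:
    """The program names a capability; the data base supplies the bytes."""
    return expand_cm(TERMCAP[term]["cm"], row, col)


print(repr(move_cursor("vt100", 5, 10)))   # '\x1b[6;11H'
print(repr(move_cursor("adm3a", 5, 10)))   # '\x1b=%*'
```

The calling program is identical for both terminals; only the data base entry differs, which is what lets one program drive hundreds of terminal types.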


Improvements still necessary 

With all these enhancements, however, 4.2 BSD 
leaves room for improvement for use on workstations, 
according to Joy. Among the improvements Joy 
would like to see for the single-user environment are 
better access to files on server machines; a better
memory management facility to accommodate the re- 
quired number of individual jobs on each station; 
improved paging algorithms that will not degrade per- 
formance; and optimized process scheduling. 
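Paging algorithms of this kind typically build on per-page referenced and modified bits such as those the Sun MMU maintains. As one textbook illustration (a swapped-in technique, not a description of the 4.2 BSD pager), a clock or second-chance algorithm sweeps the page frames in a circle, clears referenced bits as it passes, and evicts the first page it finds unreferenced, noting whether that page must first be written back because its modified bit is set.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    page: int
    referenced: bool = False
    modified: bool = False


def clock_evict(frames, hand):
    """Return (victim_index, needs_writeback, new_hand).

    A page whose referenced bit is set gets a second chance: the bit is
    cleared and the hand moves on. The first unreferenced page is evicted.
    Terminates within one full sweep, since every bit passed is cleared.
    """
    n = len(frames)
    while True:
        f = frames[hand]
        if f.referenced:
            f.referenced = False
            hand = (hand + 1) % n
        else:
            return hand, f.modified, (hand + 1) % n


frames = [Frame(10, referenced=True), Frame(11, referenced=True, modified=True),
          Frame(12, modified=True), Frame(13)]
victim, dirty, hand = clock_evict(frames, hand=0)
print(frames[victim].page, dirty, hand)
```

Here pages 10 and 11 were recently used and survive; page 12 is evicted and, being modified, must be written back before its frame is reused.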

Porting to a new processor is one way to improve 
upon 4.2 BSD. That, however, is not an easy task. 
Porting requires modification of the Unix source code 
so that it works on another type of computer system. 

According to Jeffrey Schriebman, president of 
Unisoft Systems (Berkeley, Calif), the most rigorous 
aspect of the task is porting the kernel. This is because 
the kernel interacts with the computer hardware more 
than any other part of the operating system. Any- 
where from 20 to 30 percent of the 15,000 lines of 
kernel source code need to be changed to make Unix 
operational on a new CPU. 

Approximately 10 to 20 percent of the kernel in- 
volves interaction with the memory management unit 
(MMU). This includes process creation, resource allocation,
and process swapping. When a new MMU is
used for an existing CPU type, this code must be 
redone. 

Another 10 percent of the kernel code involves the 
device drivers. This code generally needs to be 
changed even for a minimal-effort port. The remaining 
portion (approximately 70 to 80 percent) of the Unix 
kernel code should pass through the porting process 
unchanged. This makes most of the source code the 
same for a VAX, a Bell 3B20, or a Motorola 68000.
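Applied to the 15,000-line kernel Schriebman cites, these percentages translate into rough line counts, and the pieces are consistent with his overall estimate: 10 to 20 percent for MMU code plus 10 percent for drivers gives the 20 to 30 percent that must change.

```python
KERNEL_LINES = 15_000

# (low %, high %) of kernel source, per the estimates quoted above
shares = {
    "MMU-dependent code": (10, 20),
    "device drivers": (10, 10),
    "unchanged by the port": (70, 80),
}

for part, (lo, hi) in shares.items():
    print(f"{part}: {KERNEL_LINES * lo // 100:,} to {KERNEL_LINES * hi // 100:,} lines")

changed_lo = KERNEL_LINES * (10 + 10) // 100
changed_hi = KERNEL_LINES * (20 + 10) // 100
print(f"total to rework: {changed_lo:,} to {changed_hi:,} lines")
```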

While porting the Unix kernel is the most demand- 
ing part of the porting task and requires the most 
experience, it is only part of the job, according to 
Schriebman. Porting the 300,000 lines of utility soft- 
ware as well as the innumerable application packages 
presents several subtle difficulties. Also, if the target 
machine uses a new CPU, the source code must be 
checked for machine dependence. 
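Byte order is one classic example of such machine dependence: the 68000 stores multibyte integers most significant byte first, while the VAX stores them least significant byte first. Code that reinterprets raw bytes as integers therefore gets different answers on the two machines. Python's struct module makes the two conventions explicit; `read_be32` is an illustrative helper name.

```python
import struct

raw = bytes([0x00, 0x00, 0x01, 0x02])   # four bytes as laid out in memory or on disk

as_68000 = struct.unpack(">I", raw)[0]  # big-endian, as a 68000 reads a long word
as_vax = struct.unpack("<I", raw)[0]    # little-endian, as a VAX reads a longword

print(as_68000, as_vax)


def read_be32(buf: bytes, offset: int = 0) -> int:
    """Portable code names the byte order explicitly instead of assuming the host's."""
    return struct.unpack_from(">I", buf, offset)[0]
```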

While the popularity of Unix is increasing dramati- 
cally, so is the proliferation of different Unix versions 
and implementations. ‘‘Porter’’ engineers thus face
as many benefits as limitations when writing software
that must be compatible across machines.
Fig 6 An example of a high resolution workstation designed on the Q-bus is the Jupiter 12. Sporting a 32-bit 68010
processor, the unit also runs the Unix operating system and connects to an Ethernet. Among the standard graphics
capabilities incorporated in the single workstation through the power of a 32-bit microprocessor are anti-aliasing
vectors, solids, and text. Display-list management with three-dimensional transformations into multiple windows is
also available.


125 of the most useful System III commands and a
small, but important number of Berkeley 4.2 BSD 
commands are also offered. 

To satisfy the customer base, enhancements in
programming languages, graphics, database management,
device and instrumentation I/O, local area networking,
and friendly user interfacing are being standardized.
These extensions, which appear as additional
kernel intrinsics, libraries, and commands, will
bridge the gap between the company's HP-UX and
non-HP-UX computers.


A matter of compatibility 

According to Michael Hetrick, project manager 
at Hewlett-Packard’s Loveland, Colo facility, per- 
haps the most critical issue in establishing the future 
course for HP-UX is its degree of compatibility with 
AT&T Bell Lab’s Unix and the Berkeley version. 
Says Hetrick, ‘‘While 4.2 BSD Unix is currently the 
superior version, Bell is developing improved versions
that could eventually surpass 4.2 BSD in capability
and reliability.’’ Also, four microprocessor
manufacturers are in the process of validating System V,
AT&T Bell Lab's latest Unix version, on their
microprocessor products. The four companies are
Motorola (Phoenix, Ariz), Zilog (Campbell, Calif), 
National Semiconductor, and Intel (see Computer 
Design, Aug 1983, p 113). System V could become 
the most affordable Unix and the Unix of choice for 
portable application programs. 

In light of these factors, Hewlett-Packard engi- 
neers choose the Bell System III version as the base 
standard. The compatibility hierarchy will determine 
which portions of System V and its successors are 
HP-UX candidates. 

Hetrick projects extensions beyond the Bell ver- 
sions if these fail to meet company requirements in 
a timely fashion. However, he says he prefers to 
adopt an existing Unix-based implementation before 
embarking on an original design project. The
University of California at Berkeley 4.2 BSD version
will almost certainly be a rich source of enhancements;
features such as its C shell, mailer, and selected
kernel intrinsics are expected additions.

Hetrick also says that Microsoft’s (Bellevue, 
Wash) Xenix, with its large installed base and a
potentially well-developed source of Unix application 


programs, might also influence the HP-UX operating 
system. This is because both Xenix and HP-UX are 
being selectively enhanced with Bell System V and
Berkeley features to the same System III definition.
Thus, conformance between Xenix and HP-UX
is likely.


Although Unix is emerging as the 
standard, its system facilities must be 
reevaluated in light of changes in the 
workstation environment. 


In support of low cost computer systems, Hewlett- 
Packard engineers are also examining methods of 
subsetting HP-UX without sacrificing compatibility 
or an easy upgrade path to the higher performance
systems. Code-compaction and reduction techniques for
both the operating system kernel and the disk- 
resident commands are being considered. Under 
investigation is a high performance distributed 
HP-UX operating system that allows individual 
workstations to rely totally on shared-network 
peripherals. Thus, the cost per system is dramati- 
cally reduced, but local processing power is main- 
tained. HP-UX will also be modified to support 
several European languages and the 16-bit Kanji char- 
acter set. Thus, localized application program solu- 
tions will be possible. 

Jupiter Systems Inc’s vice president and director 
of technical support, Peter Harris, places his confi- 
dence in the Berkeley Unix 4.2 in more direct terms 
by stating, ‘‘If you were marooned on a desert island 
and only had one operating system to run on your 
workstation, it would have to be the Berkeley ver- 
sion.’’ Harris was under different design constraints 
when developing the Jupiter 12 color raster graphics 
workstation. Based on a 68010, and using the Q-bus 
as the central data distribution channel, the work- 
station had to have a graphics resolution of 1280 x 
1024 pixels at a 60-Hz refresh rate. The graphics also 
had to be accessible by many users (Fig 6). 
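The 1280- x 1024-pixel, 60-Hz requirement implies a demanding pixel rate. The sketch counts visible pixels only; real video timing adds horizontal and vertical blanking intervals, so the actual dot clock must be higher still.

```python
width, height, refresh_hz = 1280, 1024, 60

pixels_per_frame = width * height
pixels_per_second = pixels_per_frame * refresh_hz
ns_per_pixel = 1e9 / pixels_per_second

print(f"{pixels_per_frame:,} pixels/frame, "
      f"{pixels_per_second / 1e6:.1f} Mpixels/s, "
      f"{ns_per_pixel:.1f} ns per pixel (before blanking)")
```

Roughly 12.7 ns per visible pixel is far beyond what a general-purpose microprocessor could supply pixel by pixel, which is why the display refresh is handled by dedicated video hardware.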

According to Harris, ‘‘It is quite common for a 
team of programmers, all ostensibly working on 
graphics programs, to be able to share a graphics 
display. They all have their own alphanumeric ter- 
minal, of course, but everybody shares the single 
graphics screen display.’’ In a development environ- 
ment, says Harris, a large percentage of clock time 
will be spent on the text editor or compiling and link- 
ing. These cycles typically run from a minimum of 
1 min for a small change, to all day for a major new 
software module. When it comes time to look at the 
graphics screen, adds Harris, it will only take 1 or
2 s (1 min in the worst case, if things are done slowly)
to actually draw the picture—the result of all the 
programming. 
This kind of situation calls for high performance,
high resolution single-user workstations. To achieve 
high speed and access to every individual pixel, 
Jupiter engineers use a high speed 16-bit ALU for 
coordinate transformation, and assign statements to 
each pixel instead of using function calls. Also, a 
32-bit microprocessor provides at least 21 bits of 
logical addressing for 1280 x 1024 pixels, while an 
image memory port window is mapped to the main 
memory bus. The result is a station, with from 4 to 
32 memory planes, that allows the user to choose 
between a color lookup table system with a 16.7- 
million color palette, or two different RGB options 
with up to 8 bits per color and 8 bits of overlay. 
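Several figures in this paragraph follow from simple arithmetic. 1280 x 1024 = 1,310,720 pixels exceeds 2^20 (1,048,576), so at least 21 address bits are needed; a 16.7-million-color palette is 2^24 entries, or 8 bits each of red, green, and blue; and 3 x 8 color bits plus 8 overlay bits accounts for the 32-plane maximum. The lookup-table part of the sketch is illustrative only: the 8-plane configuration, table entries, and function name are assumptions, not Jupiter's design.

```python
import math

pixels = 1280 * 1024
address_bits = math.ceil(math.log2(pixels))
palette = 2 ** (3 * 8)                 # 8 bits each of red, green, blue
max_planes = 3 * 8 + 8                 # RGB plus 8 overlay bits

print(address_bits, f"{palette:,}", max_planes)

# Lookup-table mode: each pixel stores a small index, and the table maps
# that index to one of the 2**24 possible colors.
PLANES = 8                             # assumed: 8 memory planes -> 256 table entries
clut = [(0, 0, 0)] * (2 ** PLANES)
clut[1] = (255, 0, 0)                  # entry 1 -> pure red
clut[2] = (0, 128, 255)                # entry 2 -> a blue chosen from the full palette

frame = [[0] * 1280 for _ in range(1024)]   # one index per pixel
frame[10][20] = 2


def displayed_color(x, y):
    return clut[frame[y][x]]
```

The table lets a modest number of planes select from the full 16.7-million-color range, at the cost of only 2^PLANES distinct colors on screen at once.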

Thus, workstations currently entering the market 
are ready to serve high performance applications 
from the office desk environment. While 32-bit 
microprocessors play a key role in making worksta- 
tions powerful entities, much remains to be done on 
the periphery to make these micro ‘‘stars’’ award-winning
performers. It still remains to be seen who
has the best supporting cast. 


Obtaining bus specifications 

Readers interested in obtaining the latest versions of open- 
system 32-bit bus specifications should contact the fol- 
lowing individuals: 


Multibus II: 
John Beaston, Intel Corp, 5220 NE Elam Young Pkwy,
Hillsboro, OR 97123. 


NuBus: 
George White, Texas Instruments, 17881 Cartwright 
Rd, Irvine, CA 92714. 


VMEbus: 
Wayne Fischer, Force Computers Inc, 2041 Mission 
College Blvd, Santa Clara, CA 95054. 


IEEE Futurebus: 
Paul Borrill, University College London, Mullard 
Space Science Lab, Holmbury St Mary, Dorking RH5 
6NT England. 